(19) Europäisches Patentamt
     European Patent Office
     Office européen des brevets

(11) EP 0 782 343 A2

(12) EUROPEAN PATENT APPLICATION

(43) Date of publication: 02.07.1997 Bulletin 1997/27

(21) Application number: 96120920.2

(22) Date of filing: 27.12.1996

(51) Int. Cl.: H04N 7/50, H04N 7/46

(84) Designated Contracting States: DE FR GB IT NL

(30) Priority: 27.12.1995 JP 340609/95

(71) Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
     Kadoma-shi, Osaka 571 (JP)

(72) Inventor: Tan, Thiow Keng
     11-08, Singapore 2368 (SG)

(74) Representative: Grünecker, Kinkeldey, Stockmair & Schwanhäusser
     Anwaltssozietät
     Maximilianstrasse 58
     80538 München (DE)

(54) Video coding method 

(57) A new predictive coding is used to increase the 
temporal frame rate and coding efficiency without intro- 
ducing excessive delay. Currently the motion vector for 
the blocks in the bi-directionally predicted frame is 
derived from the motion vector of the corresponding 
block in the forward predicted frame using a linear 
motion model. 

This however is not effective when the motion in the 
image sequence is not linear. According to this inven- 
tion, the efficiency of this method can be further 
improved if a non-linear motion model is used. In this 
model a delta motion vector is added to or subtracted 
from the derived forward and backward motion vector, 
respectively. The encoder performs an additional 
search to determine if there is a need for the delta 
motion vector. The presence of this delta motion vector 
in the transmitted bitstream is signalled to the decoder 
which then takes the appropriate action to make use of 
the delta motion vector to derive the effective forward 
and backward motion vectors for the bi-directionally pre- 
dicted block. 

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention can be used in low bit rate video coding for telecommunication applications. It improves the temporal
frame rate of the decoder output as well as the overall picture quality.

2. Related Art of the Invention

In a typical hybrid transform coding algorithm such as ITU-T Recommendation H.261 [1] and MPEG [2], motion
compensation is used to reduce the temporal redundancy in the sequence. In the H.261 coding scheme, the
frames are coded using only forward prediction; such frames are hereafter referred to as P-frames. In the MPEG
coding scheme, some frames are coded using bi-directional prediction; such frames are hereafter referred to as
B-frames. B-frames improve the efficiency of the coding scheme. Here, [1] is ITU-T Recommendation H.261
(formerly CCITT Recommendation H.261), Codec for audiovisual services at p x 64 kbit/s, Geneva, 1990, and [2] is
ISO/IEC 11172-2:1993, Information technology - Coding of moving pictures and associated audio for digital
storage media at up to about 1,5 Mbit/s - Part 2: Video.

However, the use of B-frames introduces delay in the encoding and decoding, making it unsuitable for
communication services where delay is an important parameter. Figures 1a and 1b illustrate the frame prediction
of H.261 and MPEG as described above. A new method of coding, in which the P-frame and B-frame are coded as a
single unit, hereafter referred to as the PB-frame, was introduced. In this scheme the blocks in the PB-frames
are coded and transmitted together, thus reducing the total delay. In fact, the total delay should be no more
than that of a scheme using forward prediction only at half the frame rate.

Figure 2a shows the PB-frame prediction. A PB-frame consists of two pictures being coded as one unit. The name
PB comes from the names of the picture types in MPEG, where there are P-frames and B-frames. Thus a PB-frame
consists of one P-frame, which is predicted from the last decoded P-frame, and one B-frame, which is predicted
both from the last decoded P-frame and from the P-frame currently being decoded. This last picture is called a
B-frame because parts of it may be bi-directionally predicted from the past and future P-frames.

Figure 2b shows the forward and bi-directional prediction for a block in the B-frame, hereafter referred to as a
B-block. Only the region that overlaps with the corresponding block in the current P-frame, hereafter referred to
as the P-block, is bi-directionally predicted. The rest of the B-block is forward predicted from the previous
frame. Thus only the previous frame is required in the frame store. The information from the P-frame is obtained
from the P-block currently being decoded.
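
The region rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the normative procedure: the function name `predict_b_block`, the convention that a vector (dx, dy) is added to a pixel's column and row, the restriction to whole-pixel vectors, and the unrounded average are all choices made for the example. `fwd_block` stands for the block already fetched from the previous frame with the forward vector.

```python
def predict_b_block(fwd_block, p_block, mv_b):
    """Predict a B-block from a forward-predicted block and the co-located P-block.

    fwd_block: N x N pixels fetched from the previous frame (forward prediction).
    p_block:   N x N reconstructed pixels of the P-block currently being decoded.
    mv_b:      backward motion vector (dx, dy) in whole pixels (assumed convention).
    """
    n = len(fwd_block)
    dx, dy = mv_b
    pred = []
    for r in range(n):
        row = []
        for c in range(n):
            pr, pc = r + dy, c + dx  # position referenced inside the P-block
            if 0 <= pr < n and 0 <= pc < n:
                # Overlap region: average the forward and backward predictions.
                row.append((fwd_block[r][c] + p_block[pr][pc]) / 2)
            else:
                # Outside the P-block: forward prediction only.
                row.append(fwd_block[r][c])
        pred.append(row)
    return pred
```

Because the backward reference never leaves the P-block being decoded, the sketch needs no access to a stored future frame, which matches the statement that only the previous frame is kept in the frame store.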

In the PB-block, only the motion vectors for the P-block are transmitted to the decoder. The forward and backward
motion vectors for the B-block are derived from the P motion vectors. A linear motion model is used, and the
temporal references of the B-frame and P-frame are used to scale the motion vector appropriately. Figure 3a
depicts the motion vector scaling, and the formulas are shown below.

$$MV_F = (TR_B \times MV) / TR_P \tag{1}$$

$$MV_B = ((TR_B - TR_P) \times MV) / TR_P \tag{2}$$

where

$MV$ is the motion vector of the P-block,

$MV_F$ and $MV_B$ are the forward and backward motion vectors for the B-block,

$TR_B$ is the increment in the temporal reference from the last P-frame to the current B-frame, and

$TR_P$ is the increment in the temporal reference from the last P-frame to the current P-frame.
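
As a worked illustration of equations (1) and (2), the following Python sketch scales one component of the P-block motion vector. The function name `scale_motion_vector` is hypothetical, and exact rational arithmetic is used only because the text does not state how the division is rounded; a real codec would fix a rounding rule.

```python
from fractions import Fraction


def scale_motion_vector(mv, tr_b, tr_p):
    """Return (mv_f, mv_b) for one vector component per equations (1) and (2).

    mv:   P-block motion vector component.
    tr_b: temporal reference increment, last P-frame to current B-frame.
    tr_p: temporal reference increment, last P-frame to current P-frame.
    The results are exact fractions; rounding is left open (assumption).
    """
    if tr_p <= 0:
        raise ValueError("tr_p must be positive")
    mv_f = Fraction(tr_b * mv, tr_p)           # equation (1)
    mv_b = Fraction((tr_b - tr_p) * mv, tr_p)  # equation (2)
    return mv_f, mv_b
```

For example, with the B-frame midway between the two P-frames (`tr_b = 1`, `tr_p = 2`) and `mv = 4`, the forward vector is 2 and the backward vector is -2.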

Currently the method used in the prior art assumes a linear motion model. However this assumption is not valid in 
a normal scene where the motion is typically not linear. This is especially true when the camera shakes and when 
objects are not moving at constant velocities. 

A second problem involves the quantization and transmission of the residual of the prediction error in the B-block.
Currently the coefficients from the P-block and the B-block are interleaved in some scanning order, which requires
the B-block coefficients to be transmitted even when they are all zero. This is not very efficient, as quite often
there are no residual coefficients to transmit (all coefficients are zero).

SUMMARY OF THE INVENTION

In order to solve the first problem, the current invention employs a delta motion vector to compensate for the
non-linear motion. Thus it becomes necessary for the encoder to perform an additional motion search to obtain the
optimum delta motion vector that, when added to the derived motion vectors, results in the best match in the
prediction. These delta motion vectors are transmitted to the decoder at the block level only when necessary. A
flag is used to indicate to the decoder whether delta motion vectors are present for the B-block.

For the second problem, this invention also uses a flag to indicate if there are coefficients for the B-block to be 
decoded. 

The operation of the invention is described as follows.

Figure 3a shows the linear motion model used for the derivation of the forward and backward motion vectors from
the P-block motion vector and the temporal reference information. As illustrated in Figure 3b, this model breaks
down when the motion is not linear: the derived forward and backward motion vectors then differ from the actual
motion vectors. This is especially true when objects in the scene are moving at changing velocities.

In the current invention the problem is solved by adding a small delta motion vector to the derived motion vector
to compensate for the difference between the derived and true motion vectors. Therefore equations (1) and (2) are
now replaced by equations (3) and (4), respectively:

$$MV_F' = (TR_B \times MV) / TR_P + MV_{Delta} \tag{3}$$

$$MV_B' = ((TR_B - TR_P) \times MV) / TR_P - MV_{Delta} \tag{4}$$

where

$MV$ is the motion vector of the P-block,
$MV_{Delta}$ is the delta motion vector,
$MV_F'$ and $MV_B'$ are the new forward and backward motion vectors for the B-block according to the current invention,
$TR_B$ is the increment in the temporal reference from the last P-frame to the current B-frame, and
$TR_P$ is the increment in the temporal reference from the last P-frame to the current P-frame.

Note: Equations (3) and (4) are used for the motion vector in the horizontal as well as the vertical directions. Thus 
the motion vectors are in pairs and there are actually two independent delta motion vectors, one each for the 
horizontal and vertical directions. 
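
A minimal Python sketch of equations (3) and (4), applied to both components at once, is given below. The name `final_b_vectors` and the (horizontal, vertical) tuple layout are assumptions for illustration; exact fractions are used because the rounding of the division is not specified in the text.

```python
from fractions import Fraction


def final_b_vectors(mv, mv_delta, tr_b, tr_p):
    """Return (mv_f', mv_b') per equations (3) and (4).

    mv, mv_delta: (horizontal, vertical) pairs; each component is handled
    independently, as stated in the note above.
    """
    if tr_p <= 0:
        raise ValueError("tr_p must be positive")
    # Equation (3): scaled forward vector plus the delta.
    mv_f = tuple(Fraction(tr_b * m, tr_p) + d for m, d in zip(mv, mv_delta))
    # Equation (4): scaled backward vector minus the same delta.
    mv_b = tuple(Fraction((tr_b - tr_p) * m, tr_p) - d for m, d in zip(mv, mv_delta))
    return mv_f, mv_b
```

With `mv = (4, -2)`, `mv_delta = (1, 0)`, `tr_b = 1` and `tr_p = 2`, the sketch yields a forward vector of (3, -1) and a backward vector of (-3, 1); a zero delta reduces it to equations (1) and (2).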

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1a is prior art and illustrates the prediction mode used in the ITU-T Recommendation H.261 Standard.
Figure 1b is prior art and illustrates the prediction mode used in the ISO-IEC/JTC MPEG Standard.
Figure 2a illustrates the PB-frame prediction mode.
Figure 2b illustrates the B-block bi-directional prediction mode.
Figure 3a illustrates the linear motion model.
Figure 3b illustrates the non-linear motion model of the current invention.
Figure 4 illustrates the encoder functionality block diagram.
Figure 5 illustrates the B-block bi-directional prediction functionality block diagram.
Figure 6 illustrates the decoder functionality block diagram.

PREFERRED EMBODIMENTS 

The preferred embodiment of the current invention is described here. Figure 4 illustrates the encoding
functionality diagram. The present invention deals with the method for deriving the motion vectors for the
B-block. The encoding functionality is presented here for completeness of the embodiment.

The encoding functionality block diagram depicts an encoder using motion estimation and compensation for
reducing the temporal redundancy in the sequence to be coded. The input sequence is organized into a first frame
and pairs of subsequent frames. The first frame, hereafter referred to as the I-frame, is coded independently of
all other frames. Each pair of subsequent frames, hereafter referred to as a PB-frame, consists of a B-frame
followed by a P-frame. The P-frame is forward predicted based on the previously reconstructed I-frame or P-frame,
and the B-frame is bi-directionally predicted based on the previously reconstructed I-frame or P-frame and the
information in the current P-frame.

The input frame image sequence, 1, is placed in the Frame Memory 2. If the frame is classified as an I-frame or a
P-frame it is passed through line 14 to the Reference Memory 3, for use as the reference frame in the motion
estimation of the next PB-frame to be predictively encoded. The signal is then passed through line 13 to the Block
Sampling module 4, where it is partitioned into spatially non-overlapping blocks of pixel data for further
processing.

If the frame is classified as an I-frame, the sampled blocks are passed through line 16 to the DCT module 7. If the
frame is classified as a PB-frame, the sampled blocks are passed through line 17 to the Motion Estimation module 5.

The Motion Estimation module 5 uses information from the Reference Frame Memory 3 and the current block 17 to
obtain the motion vector that provides the best match for the P-block. The motion vector and the local
reconstructed frame, 12, are passed through lines 19 and 20, respectively, to the Motion Compensation module 6.
The difference image is formed by subtracting the motion compensated decoded frame, 21, from the current P-block,
15. This signal is then passed through line 22 to the DCT module 7.

In the DCT module 7, each block is transformed into DCT domain coefficients. The transform coefficients are
passed through line 23 to the Quantization module 8, where they are quantized. The quantized coefficients are then
passed through line 24 to the Run-length & Variable Length Coding module 9. Here the coefficients are entropy
coded to form the Output Bit Stream, 25.

If the current block is an I-block or a P-block, the quantized coefficients are also passed through line 26 to the
Inverse Quantization module 10. The output of the Inverse Quantization 10 is then passed through line 27 to the
Inverse DCT module 11. If the current block is an I-block, the reconstructed block is placed, via line 28, in the
Local Decoded Frame Memory 12. If the current block is a P-block, the output of the Inverse DCT 29 is added to the
motion compensated output 21 to form the reconstructed block 30. The reconstructed block 30 is then placed in the
Local Decoded Frame Memory 12 for the motion compensation of the subsequent frames.

After the P-block has been locally reconstructed, the information is passed again to the Motion Compensation
Module 6, where the prediction of the B-block is formed. Figure 5 shows a more detailed functional diagram for the
B-block prediction process. The P-motion vector derived in the Motion Estimation module 51 is passed through line
57 to the Motion Vector Scaling Module 53. Here the forward and backward motion vectors of the B-block are derived
using formulas (1) and (2), respectively. In the present embodiment, an additional motion search around these
vectors is performed in the Delta Motion Search module 54 to obtain the delta motion vector. In this embodiment
the delta motion vector is obtained by performing the search for all delta motion vector values between -3 and 3.
The delta motion vector value that gives the best prediction, in terms of the smallest mean absolute difference
between the pixel values of the B-block and the prediction block, is chosen. The prediction is formed in the
Bi-directional Motion Compensation module 55 according to Figure 2b, using the information from the Local Decoded
Frame Memory 52 and the Current Reconstructed P-block 50. In the bi-directional prediction, only information
available in the corresponding P-block is used to predict the B-block. The average of the P-block information and
the information from the Local Decoded Frame is used to predict the B-block. The rest of the B-block is predicted
using information from the Local Decoded Frame only.
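
The delta motion search described above might be sketched in Python as follows. This is a simplified illustration, not the embodiment itself: the helper names (`fetch_block`, `mean_abs_diff`, `delta_search`), whole-pixel vectors, the convention that a vector (dx, dy) is added to the block position, clamping at the frame border, and averaging over the whole block (rather than only the region overlapping the P-block) are all assumptions. Frames are plain lists of pixel rows.

```python
def fetch_block(frame, top, left, size):
    """Copy a size x size block, clamping coordinates at the frame border."""
    h, w = len(frame), len(frame[0])
    return [[frame[min(max(top + r, 0), h - 1)][min(max(left + c, 0), w - 1)]
             for c in range(size)] for r in range(size)]


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized blocks."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def delta_search(b_frame, prev_frame, p_frame, top, left, size, mv_f, mv_b, search=3):
    """Try every delta in [-search, search] per component; return (delta, cost).

    mv_f, mv_b: scaled forward/backward vectors (dx, dy) from formulas (1), (2).
    The delta is added to mv_f and subtracted from mv_b, as in (3) and (4).
    """
    target = fetch_block(b_frame, top, left, size)
    best_cost, best_delta = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            fwd = fetch_block(prev_frame, top + mv_f[1] + dy, left + mv_f[0] + dx, size)
            bwd = fetch_block(p_frame, top + mv_b[1] - dy, left + mv_b[0] - dx, size)
            pred = [[(f + b) / 2 for f, b in zip(rf, rb)] for rf, rb in zip(fwd, bwd)]
            cost = mean_abs_diff(target, pred)
            if best_cost is None or cost < best_cost:
                best_cost, best_delta = cost, (dx, dy)
    return best_delta, best_cost
```

A returned delta of (0, 0) corresponds to the case in which no delta motion vector needs to be transmitted for the block.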

The prediction difference block is then passed through line 22 to the DCT module 7. The DCT coefficients are then
passed through line 23 to the Quantization module 8. The result of the Quantization module 8 is passed through
line 24 to the Run-length & Variable Length Coding 9. In this module the presence of the delta motion vector and
the quantized residual error in the Output Bitstream 25 is indicated by a variable length code, NOB, which is the
acronym for No B-block. This flag is generated in the Run-length & Variable Length Coding module 9 based on whether
there is residual error from the Quantization module 8 and whether the delta motion vectors found in the Delta
Motion Search module 54 are non-zero. Table 1 provides the preferred embodiment of the variable length code for
the NOB flag. The variable length code of the NOB flag is inserted in the Output Bitstream, 25, prior to the delta
motion vector and quantized residual error codes.



Table 1 (Variable length code for the NOB flag)

NOB    Quantized Residual Error Coded    Delta Motion Vectors Coded
0      No                                No
10     No                                Yes
110    Yes                               No
111    Yes                               Yes
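
The code assignment in Table 1 is prefix-free, so a decoder can read the flag bit by bit without a length field. The following Python sketch shows one way to write and read it; the names `NOB_CODES`, `encode_nob` and `decode_nob`, and the representation of the bitstream as a string of '0' and '1' characters, are assumptions made for illustration.

```python
# Table 1: (residual error coded, delta motion vectors coded) -> NOB code.
NOB_CODES = {
    (False, False): "0",
    (False, True): "10",
    (True, False): "110",
    (True, True): "111",
}


def encode_nob(residual_coded, delta_mv_coded):
    """Return the NOB variable length code for a B-block."""
    return NOB_CODES[(bool(residual_coded), bool(delta_mv_coded))]


def decode_nob(bits, pos=0):
    """Read one NOB code starting at bits[pos].

    Returns (residual_coded, delta_mv_coded, next_pos). Because the codes are
    prefix-free, at most one table entry can match at a given position.
    """
    for (residual_coded, delta_mv_coded), code in NOB_CODES.items():
        if bits.startswith(code, pos):
            return residual_coded, delta_mv_coded, pos + len(code)
    raise ValueError("truncated or invalid NOB code")
```

The shortest code is given to the case in which neither the residual nor the delta motion vector is sent, so such a B-block costs a single bit of side information.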

Figure 6 shows the functional block diagram for the decoder. The Input Bit Stream 31 is passed to the Variable
Length & Run Length Decoding module 32. The block and side information are extracted in this module. If the frame
is a PB-frame, the bitstream is checked to determine whether any delta motion vector and/or quantized residual
error coefficients are present. The output of module 32 is passed through line 37 to the Inverse Quantization
module 33. The output of the Inverse Quantization 33 is then passed through line 38 to the Inverse DCT module 34.
Here the coefficients are transformed back into pixel values.

If the current frame is an I-frame, the output of the Inverse DCT 34 is passed through line 39 and stored in the
Frame Memory 42.

If the current frame is a PB-frame, the side information containing the motion vectors is passed through line 45
to the Motion Compensation module 36. The Motion Compensation module 36 uses this information and the information
in the Local Decoded Memory, 35, to form the motion compensated signal, 44. This signal is then added to the output
of the Inverse DCT module 34 to form the reconstruction of the P-block.

The Motion Compensation module 36, then uses the additional information obtained in the reconstructed P-block 
to obtain the bi-directional prediction for the B-block. The B-block is then reconstructed and placed in the Frame Mem- 
ory, 42, together with the P-block. 

By implementing this invention, the temporal frame rate of the decoded sequences can be effectively doubled at a 
fraction of the expected cost in bit rate. The delay is similar to that of the same sequence decoded at half the frame rate. 

As described above, in the present invention a new predictive coding is used to increase the temporal frame rate
and coding efficiency without introducing excessive delay. Currently the motion vector for the blocks in the
bi-directionally predicted frame is derived from the motion vector of the corresponding block in the forward
predicted frame using a linear motion model. This however is not effective when the motion in the image sequence
is not linear. According to this invention, the efficiency of this method can be further improved if a non-linear
motion model is used. In this model a delta motion vector is added to or subtracted from the derived forward and
backward motion vector, respectively. The encoder performs an additional search to determine if there is a need
for the delta motion vector. The presence of this delta motion vector in the transmitted bitstream is signalled to
the decoder, which then takes the appropriate action to make use of the delta motion vector to derive the
effective forward and backward motion vectors for the bi-directionally predicted block.

Claims 

1. A method for encoding a sequence of video image frames comprising the steps of:

dividing a source sequence into a set of group of pictures, each group of pictures comprising a first frame,
hereafter referred to as I-frame, followed by a plurality of pairs of predictively encoded frames, hereafter
referred to as PB-frames;

dividing each I-frame or PB-frame into spatially non-overlapping blocks of pixel data;

encoding the blocks from the said I-frame, hereafter referred to as the I-blocks, independently from any other
frames in the group of pictures;

predictively encoding the blocks from the second frame of the said PB-frame, hereafter referred to as the
P-blocks, based on the I-blocks in the previous I-frame or the P-blocks in the previous PB-frame;

bi-directionally predictively encoding the blocks from the first frame of the said PB-frame, hereafter referred to
as the B-blocks, based on the I-blocks in the previous I-frame or the P-blocks in the previous PB-frame and the
corresponding P-block in the current PB-frame;

deriving forward and backward motion vectors for the said B-block by scaling the motion vector of the
corresponding P-block in the current PB-frame;

obtaining a final forward motion vector by adding a delta motion vector to the said scaled forward motion vector;
and

obtaining a final backward motion vector by subtracting the same delta motion vector from the said scaled
backward motion vector.

2. A method for encoding a sequence of video image frames according to claim 1, wherein

the scaling of the motion vector is based on the temporal reference of the first and second frames in the said
PB-frame.

3. A method for encoding a sequence of video image frames according to claim 1, wherein the said encoding output
is a bitstream comprising of:

the temporal reference information for the first and second frames of the said PB-frames;

the motion vector information for the said P-blocks;

the quantized residual error information for the said P-blocks;

the delta motion vector information for the said B-blocks; and

the quantized residual error information for the said B-blocks.

4. A method for encoding a sequence of video image frames producing an encoded output bitstream according to
claim 3, wherein

the said output bitstream contains an additional information to indicate the presence of:

the delta motion vector information for the said B-blocks; and / or
the quantized residual error information for the said B-blocks.

5. A method for decoding a sequence of video image frames comprising the steps of:

decoding the compressed video image sequence as a set of group of pictures, each group of pictures comprising
an I-frame followed by a plurality of PB-frames;

decoding each I-frame or PB-frame in spatially non-overlapping blocks of pixel data;

decoding the I-blocks independently from any other frames in the group of pictures;

predictively decoding the P-blocks based on the I-blocks in the previous I-frame or the P-blocks in the previous
PB-frame;

bi-directionally predictively decoding the B-blocks based on the I-blocks in the previous I-frame or the P-blocks
in the previous PB-frame and the corresponding P-block in the current PB-frame;

deriving a forward and backward motion vector information for the said B-block by scaling the motion vector
information of the corresponding P-block in the current PB-frame;

obtaining final forward motion vector by adding a delta motion vector to the said scaled forward motion vector;
and

obtaining final backward motion vector by subtracting the same delta motion vector from the said scaled
backward motion vector.



6. A method for decoding a sequence of video image frames according to claim 5, wherein
the decoder receives a bitstream comprising of:

the temporal reference information for the first and second frames of the said PB-frames;
the motion vector information for the said P-blocks;
the quantized residual error information for the said P-blocks;
the delta motion vector information for the said B-blocks; and
the quantized residual error information for the said B-blocks.

7. A method for decoding a sequence of video image frames according to claim 5 from a bitstream according to claim
6, wherein
the said bitstream contains the additional information to indicate the presence of:

the delta motion vector information for the said B-blocks; and / or
the quantized residual error information for the said B-blocks.

8. A method of decoding a sequence of video image frames according to claim 5, wherein 

the scaling is based on the temporal reference of the first and second frames of the PB-frame. 



9. An apparatus for encoding a sequence of video image frames comprising:

means of encoding each frame in a sequence of video image frames into a set of group of pictures, each group
of pictures comprising an I-frame followed by a plurality of PB-frames;

means of dividing the I-frame and the PB-frame into spatially non-overlapping blocks of pixel data;

means of encoding and decoding the I-blocks independently from any other frames in the group of pictures;

means of storing the said decoded I-blocks to predictively encode subsequent frames;

means of predictively encoding and decoding the P-blocks based on the I-blocks in the previous I-frame or the
P-blocks in the previous PB-frame;

means of storing the said decoded P-blocks to predictively encode subsequent frames;

means of deriving a forward and backward motion vector information for the said B-block by scaling the motion
vector information of the corresponding P-block in the current PB-frame;

means of obtaining final forward motion vector by adding a delta motion vector to the said scaled forward
motion vector;

means of obtaining final backward motion vector by subtracting the same delta motion vector from the said scaled
backward motion vector; and

means of encoding the B-blocks based on the I-blocks in the previous I-frame or the P-blocks in the previous
PB-frame and the corresponding P-block in the current PB-frame using the above said final motion vectors.

10. An apparatus for decoding a sequence of video image frames comprising:

means of decoding the I-blocks independently of any other frames in the group of pictures;

means of storing the said decoded I-blocks to predictively decode subsequent frames;

means of decoding the P-blocks based on the I-blocks in the previous I-frame or the P-blocks in the previous
PB-frame;

means of storing the said decoded P-blocks to predictively decode subsequent frames;

means of deriving a forward and backward motion vector information for the said B-block by scaling the motion
vector information of the corresponding P-block in the current PB-frame;

means of obtaining final forward motion vector by adding a delta motion vector to the said scaled forward
motion vector;

means of obtaining final backward motion vector by subtracting the same delta motion vector from the said scaled
backward motion vector; and

means of decoding the B-blocks based on the I-blocks in the previous I-frame or the P-blocks in the previous
PB-frame and the corresponding P-block in the current PB-frame using the above said final motion vectors.

Figure 1a: Prior Art (H.261)
Figure 1b: Prior Art (MPEG)
Figure 2a
Figure 3a
Figure 4
Figure 5
Figure 6